Delft University of Technology Projected estimators for robust semi-supervised classification
نویسندگان
چکیده
For semi-supervised techniques to be applied safely in practice we at least want methods to outperform their supervised counterparts.We study this question for classification using the well-known quadratic surrogate loss function. Unlike other approaches to semisupervised learning, the procedure proposed in this work does not rely on assumptions that are not intrinsic to the classifier at hand. Using a projection of the supervised estimate onto a set of constraints imposed by the unlabeled data, we find we can safely improve over the supervised solution in terms of this quadratic loss. More specifically, we prove that, measured on the labeled and unlabeled training data, this semi-supervised procedure never gives a lower quadratic loss than the supervised alternative. To our knowledge this is the first approach that offers such strong, albeit conservative, guarantees for improvement over the supervised solution. The characteristics of our approach are explicated using benchmark datasets to further understand the similarities and differences between the quadratic loss criterion used in the theoretical results and the classification accuracy typically considered in practice.
منابع مشابه
Semi-Supervised Learning Based Prediction of Musculoskeletal Disorder Risk
This study explores a semi-supervised classification approach using random forest as a base classifier to classify the low-back disorders (LBDs) risk associated with the industrial jobs. Semi-supervised classification approach uses unlabeled data together with the small number of labelled data to create a better classifier. The results obtained by the proposed approach are compared with those o...
متن کاملA Robust Dispersion Control Chart Based on M-estimate
Process control charts are proven techniques for improving quality. Specifying the control limits is the most important step in designing a control chart. The presence of outliers may extremely affect the estimates of parameters using classical methods. Robust estimators which are not affected by outliers or the small departures from the model assumptions are applied in this paper to specify th...
متن کاملA Two-Phase Robust Estimation of Process Dispersion Using M-estimator
Parameter estimation is the first step in constructing any control chart. Most estimators of mean and dispersion are sensitive to the presence of outliers. The data may be contaminated by outliers either locally or globally. The exciting robust estimators deal only with global contamination. In this paper a robust estimator for dispersion is proposed to reduce the effect of local contamination ...
متن کاملDetecting Concept Drift in Data Stream Using Semi-Supervised Classification
Data stream is a sequence of data generated from various information sources at a high speed and high volume. Classifying data streams faces the three challenges of unlimited length, online processing, and concept drift. In related research, to meet the challenge of unlimited stream length, commonly the stream is divided into fixed size windows or gradual forgetting is used. Concept drift refer...
متن کاملRobust Method for E-Maximization and Hierarchical Clustering of Image Classification
We developed a new semi-supervised EM-like algorithm that is given the set of objects present in eachtraining image, but does not know which regions correspond to which objects. We have tested thealgorithm on a dataset of 860 hand-labeled color images using only color and texture features, and theresults show that our EM variant is able to break the symmetry in the initial solution. We compared...
متن کامل